Epigenetic alterations affecting hematopoietic regulatory networks as drivers of mixed myeloid/lymphoid leukemia

Leukemias with ambiguous lineage comprise several loosely defined entities, often without a clear mechanistic basis. Here, we extensively profile the epigenome and transcriptome of a subgroup of such leukemias with CpG Island Methylator Phenotype. These leukemias exhibit comparable hybrid myeloid/lymphoid epigenetic landscapes, yet heterogeneous genetic alterations, suggesting they are defined by their shared epigenetic profile rather than common genetic lesions. Gene expression enrichment reveals similarity with early T-cell precursor acute lymphoblastic leukemia and a lymphoid progenitor cell of origin. In line with this, integration of differential DNA methylation and gene expression shows widespread silencing of myeloid transcription factors. Moreover, binding sites for hematopoietic transcription factors, including CEBPA, SPI1 and LEF1, are uniquely inaccessible in these leukemias. Hypermethylation also results in loss of CTCF binding, accompanied by changes in chromatin interactions involving key transcription factors. In conclusion, epigenetic dysregulation, and not genetic lesions, explains the mixed phenotype of this group of leukemias with ambiguous lineage. The data collected here constitute a useful and comprehensive epigenomic reference for subsequent studies of acute myeloid leukemias, T-cell acute lymphoblastic leukemias and mixed-phenotype leukemias.


ChIP-seq at enhancers (X-axis) between CIMP and AML (left) and T-ALL (right). The statistical significance of the comparisons between these groups was determined by the Wald test (two-sided) in the DESeq2 package and corrected for multiple testing with the Benjamini-Hochberg procedure. The values are the log10 of the false discovery rate (FDR) with the sign of the fold change. Genes with FDR < 0.05 and log2 fold change > 2 for both data types are colored (turquoise if hypermethylated, brown if hypomethylated). Those with non-significant changes are binned into grey hexagons whose opacity is proportional to the number of genes therein. Genes encoding transcription factors (TFs) are shown in solid color, among which those involved in hematopoiesis are highlighted in red (GO term: 0030097) and labeled. The rest of the genes are semitransparent. The Pearson correlation coefficient (r) and its related p-value (two-sided) for the relationship between methylation and expression are shown at the top left. Note that a single gene may be targeted by multiple enhancers, each of which is labeled based on the distance with respect to the TSS. Sample sizes for H3K27ac ChIP-seq and RNA-seq data were, respectively: 9/13 (CIMP), 51/211 (AML) and 19/100 (T-ALL).

Supplementary Figure 7.
Locus overlap analyses of methylation data indicate alterations in binding sites for key hematopoietic factors. a Bar plot depicting enrichment for experimentally confirmed TF binding sites from the ENCODE database at differentially methylated regions between CIMP and AML (left) or between CIMP and T-ALL (right), as derived from EPIC array data. Enrichment was calculated with a two-tailed Fisher's exact test using the LOLA R package. The length of the bars corresponds to the odds ratio and their color to the -log10(p-value); only a maximum of 25 results with a -log10(p-value) above 50 are shown. TFs involved in hematopoiesis (GO term: 0030097) are highlighted in red. b Same as a, but comparing CIMP and ETP-ALL and using both CODEX (top) and ENCODE (bottom) databases to calculate enrichment. c Same as a, but using MCIP-seq data to identify differentially methylated regions and the CODEX database to determine enrichment. d Same as c, but the enrichment was calculated with the ENCODE database.

). e Genomic tracks displaying footprint scores calculated by TOBIAS in the regulatory regions of KLF4 and CEBPA, as well as motifs identified in those genomic locations. H3K27ac data are shown to indicate the presence of putative enhancers or promoters, whereas MCIP reveals hypermethylation at inaccessible regions. f Correlation between motif activity as estimated by chromVAR and TPM-normalized gene expression in every individual where both RNA-seq and ATAC-seq data were available (n=79). The scatter plots show relevant TFs not included in Figure 5d; the full dataset is available in Supplementary Data 34. g Scatter plot identifying differentially expressed regulators between CIMP and AML (left) or T-ALL (right). The X-axis shows the correlation coefficient between accessibility and gene expression across the entire cohort (positive regulators have high correlation), whereas the Y-axis shows differential gene expression between two leukemia groups. The colored points are significant positive regulators, defined by rho ≥ 0.2 and log2(fold change) ≥ 1. h Heatmap displaying synergy between a subset of variable TFs, defined as the deviation of chromatin accessibility in peaks with both motifs relative to peaks with only one motif. A high synergy score can indicate cooperativity or competition between TFs.

roles in differentiation of genes like GATA2 39 or CEBPA 40. As expected, binding sites for C/EBP TFs were inaccessible in CIMPs, CEBPA DM AML and T-ALL. Supervised comparisons of motif activity revealed large differences in C/EBP between CIMP and AML, as well as between AML and T-ALL (Figure 5B, Supplementary Figure 8B). This suggests C/EBP activity might be critical for the loss of myeloid potential in CIMP leukemias, in line with previous reports of the essential function of CEBPA in establishing the myeloid trajectory 40. Interestingly, other TFs were significantly more active in CIMP than in T-ALL, including SPI1 (PU.1), which induces myeloid commitment at high levels 41 and whose downregulation is necessary for terminal T-cell maturation 42, but also BACH1, which promotes B-cell development at the expense of the myeloid lineage 43.

In order to better define the relationship between TF expression and motif activity, we combined chromVAR deviations with gene expression of 496 TFs across 79 patients where both ATAC-seq and RNA-seq were available, including 9 CIMP, 51 AML and 19 T-ALL cases. At first glance, the motif activity of the 50 most variable TFs was concordant with their expression, though discrepancies were also apparent (Figure 5C). A possible explanation is the motif similarity of different TFs belonging to the same family, of which only a fraction may be relevant in a given cell type. The loss of accessibility at TFBS of the C/EBP family in CIMP relative to AML was accompanied by transcriptional repression of C/EBP genes, except for CEBPG, which acts as a negative modulator of its homologues (Figure 5C). Conversely, CIMP cases exhibited increased motif activity and expression compared to AML. When comparing CIMP against T-ALL, this integrative analysis revealed loss of motif activity and expression for TCF7, IRF4 and LEF1, whereas BACH2 exhibited gain of TFBS accessibility yet loss of expression. Of note, LEF1, which participates in the early stages of thymocyte maturation 44, was among the few genes with promoter hypermethylation in supervised comparisons between CIMP and T-ALL.
In order to disambiguate which TFs are true regulators at their predicted binding sites in leukemia, we calculated the correlation between TFBS accessibility and gene expression, an approach commonly applied to single-cell technologies (Figure 5D, Supplementary Figure 8F, Supplementary Data 34) 45. This correlation was particularly high for CEBPB and LEF1, which positively regulate the myeloid and the lymphoid program in AML and T-ALL respectively. On the other hand, a negative correlation was observed for KLF16 and ZBTB7B, suggesting they either act as repressors or compete against other members of the same family. Indeed, KLF proteins are known to compete against each other for promoters and enhancers 46, whereas ZBTB7B functions as a repressor, particularly in T-cell differentiation 47. Furthermore, integration of these correlation coefficients with changes in expression in CIMP revealed dysregulation of key regulators possibly driving their aberrant phenotype (Supplementary Figure 8G). Namely, T-cell regulators like GATA3 or LEF1 were overexpressed in CIMP compared to AML, whereas myeloid regulators such as SPI1, CEBPB or CEBPE were repressed. On the other hand, MEIS1 and HOXA13 were upregulated relative to T-ALL, with other genes like LEF1 and KLF4 exhibiting downregulation. Interestingly, ectopic expression of HOXA13 is associated with an immature phenotype and poor prognosis and is particularly frequent in ETP-ALL 48.
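The correlation-based classification described above can be illustrated with a minimal Python sketch. It assumes a rank (Spearman) correlation, which the text does not state explicitly, and reuses the 0.2 cut-off reported for positive regulators; applying the mirrored cut-off of -0.2 to negative correlations is an assumption made here for illustration. The function names, TF labels and numbers are hypothetical and are not taken from the study data.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def classify_regulators(activity, expression, threshold=0.2):
    """Label each TF by the sign and size of its activity/expression correlation.

    `activity` and `expression` map a TF name to per-sample values in the
    same sample order. The threshold and the labels are illustrative.
    """
    labels = {}
    for tf in activity:
        rho = spearman(activity[tf], expression[tf])
        if rho >= threshold:
            label = "putative activator"
        elif rho <= -threshold:
            label = "putative repressor or competitor"
        else:
            label = "unclear"
        labels[tf] = (rho, label)
    return labels


# Invented motif deviations and expression values for five samples.
toy_activity = {
    "TF_A": [-1.2, -0.5, 0.1, 0.9, 1.5],
    "TF_B": [1.0, 0.4, 0.0, -0.6, -1.1],
}
toy_expression = {
    "TF_A": [0.5, 1.0, 2.2, 3.1, 4.0],
    "TF_B": [0.2, 0.9, 1.5, 2.4, 3.0],
}
toy_labels = classify_regulators(toy_activity, toy_expression)
```

In this toy input, TF_A behaves like a positive regulator (accessibility rises with expression), whereas TF_B shows the inverse pattern reported for factors such as KLF16 and ZBTB7B.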
Finally, some of the most variable motifs were evaluated for synergy (Supplementary Figure 8H). CTCF was highly antagonistic with KLF4, CEBPA, SPI1 and FOS, but positively associated with LEF1. Notably, both KLF4 49,50 and LEF1 51 are involved in structural reorganization of the genome, like CTCF, with KLF4 forming loops independently of CTCF. CEBPA and FOS were also strongly synergistic, in line with observations that JUN forms heterodimers with CEBPA that direct monocytic differentiation more potently than either of the two TFs alone 52. However, high JUN expression has also been reported to inhibit CEBPA binding in AML, indicating a possible competitive behaviour as well 53.

Supervised analyses of ATAC-seq and ChIP-seq data identify epigenetic signatures in line with gene expression patterns
To further understand the epigenetic makeup of CIMP leukemias, we conducted additional analyses of the ATAC-seq (Supplementary Data 35) and ChIP-seq (Supplementary Data 36) data mentioned above. Contrary to methylation and expression data, there were no large differences between CIMP and either AML or T-ALL in terms of both open chromatin (Supplementary Figure 10A) and H3K27ac deposition (Supplementary Figure 10D). Unexpectedly, however, chromatin was slightly more open at promoters in CIMPs relative to AML, despite the widespread hypermethylation leading to gene silencing. Supervised comparisons similarly revealed that even numbers of promoters and enhancers are either active or inactive in CIMP with respect to AML or T-ALL (Supplementary Figures 10B, 10E).
To draw more meaningful conclusions, we conducted GSEA on genes located in the vicinity of these variable peaks, ranked according to their differential binding in each comparison. CIMPs exhibited marked depletion of open chromatin at the targets of the PRC complex, i.e. regions marked by H3K27me3, relative to both AML and CD34+ cells (Supplementary Figure 10C). This is in keeping with the preferential DNA methylation at these same regions, which become increasingly closed as well. Remarkably, CIMPs displayed open chromatin at genes normally expressed in AML when compared to T-ALL, whereas genes involved in T-cell differentiation were closed. Similar observations were derived from GSEA on H3K27ac (Supplementary Figure 10F), but these same T-cell gene sets were more active than in CD34+ cells. Likewise, H3K27ac GSEA in CIMP vs AML revealed upregulation of T-cell sets and downregulation of AML ones. This is consistent with the epigenetic ambiguity of these leukemias and the notion that their differentiation is blocked at an intermediate stage, thus preventing a full commitment to either the T-lymphoid or the myeloid lineages.
Finally, we also generated H3K27me3 ChIP-seq for a few CIMP and AML samples to investigate whether the increases in DNA methylation are accompanied by increased deposition of H3K27me3. Interestingly, there was a slight decrease in global H3K27me3 levels (Supplementary Figure 10G), confirmed by differential analyses with DiffBind (Supplementary Figure 10H). Therefore, the establishment of DNA methylation does not require spreading of H3K27me3 to additional regions. GSEA revealed enrichment of H3K27me3 at genes involved in mitosis and NOTCH signalling and depletion at genes downregulated in various types of AML (NPM1-mutant, RUNX1-ETO fusions, etc.). However, none of these results were significant with FDR < 0.05, possibly due to the small size of the dataset.

Additional examples of altered 3D genome structure
In order to detect changes in 3D genome structure that led to alterations in gene expression, we identified variable TADs and loops that contained or overlapped differentially expressed genes. Most of these differences were observed between CIMP and AML, in line with the notion that CIMP leukemias may derive from a lymphoid progenitor.
We first investigated whether any TFs silenced by promoter methylation exhibited concomitant changes in chromatin interactions. Only three of these TFs were found in loops with decreased intensity in the CIMP group: PKNOX2, KLF4 (Figure 7G) and CEBPD (Figure 7H). In the latter two, the loss of interaction was accompanied by reduced CTCF binding and gain of methylation at the CEBPD and KLF4 promoters. On the other hand, 7 downregulated genes were found in ΔTADs, including again CEBPD and KLF4, but also other TFs like TAL1, IRF8 (Supplementary Figure 12F) and MAFB. In view of these observations, changes in 3D structure may contribute to silencing of TFs driving a differentiation block, but they are dispensable for a process that is largely driven by promoter silencing.
Next, we conducted an unbiased survey of ΔTADs (Supplementary Data 40) and DIs (Supplementary Data 41) with associated changes in CTCF binding and potential implications for gene expression. Aside from the examples described in the main text, the IRF8 TAD exhibits reduced insulation accompanied by a loss of CTCF binding on the right boundary. Similarly, the TAD containing GASK1B was partially lost, potentially leading to downregulation of this gene due to the lack of interaction with a proximal enhancer (Supplementary Figure 12G). On the other hand, the ANGPT2 TAD became strongly insulated, possibly resulting in upregulation of this gene (Supplementary Figure 12H). Gain of interaction between GATA3 and a putative enhancer that is only active in CIMP and not in AML possibly leads to overexpression of GATA3 (Figure 7I). Likewise, loops involving the promoters of DNMT3B (Supplementary Figure 12I) and DNAJC1 (Supplementary Figure 12J) are also acquired in CIMP cases, possibly leading to overexpression of these genes. In contrast, the loss of interaction between IL6 and putative upstream enhancer elements may explain the downregulation of this gene, whose promoter was unmethylated (Supplementary Figure 12K). There were only 3 ΔTADs and DIs between CIMP and T-ALL with reduced CTCF binding, one of which was the loss of insulation at the TAD containing ASCC1 (Supplementary Figure 12L). This limited chromatin remodelling is in keeping with the notion that these leukemias originate from a lymphoid-biased cell. On the other hand, there were also fewer T-ALL replicates, which decreased the statistical power to detect such changes. The 8 differential enhancer-promoter loops between CIMP and T-ALL did not exhibit a clear correlation with gene expression (Figure 7F). Among those DIs was a loss of interaction between the promoter of the longer DIPK1A isoforms and downstream exonic regions of the same gene, which is downregulated in CIMPs (Supplementary Figure 12M). This downregulation is accompanied by a loss of H3K27ac at said promoter. Another example is the gained interaction between RCC2 and distal regulatory elements close to SDHB, which could contribute to the overexpression of that gene (Supplementary Figure 12N).

Loss of CEBPA plays a critical role in shaping the leukemic epigenome
Among the TFs downregulated by methylation was CEBPA, the loss of which was originally identified as the defining feature of the CIMP-EMC cohort 54, also known as CEBPA-silenced. In line with the initial reports, a recurrent observation across the analyses of epigenomics data was that CIMP leukemias exhibited a profound similarity with CEBPA double mutant AML (Supplementary Figures 2A-G). Double CEBPA mutations define an AML subtype (CEBPA DM) with a distinct gene expression profile, comparable to that of CIMP leukemias 54,55. These patients typically exhibit a combination of N- and C-terminal mutations in the CEBPA protein that disrupt its normal function 56. Moreover, CIMPs also clustered in the vicinity of AMLs with t(8;21), a chromosomal aberration that produces a RUNX1-RUNX1T1 fusion protein, which inhibits the expression of CEBPA 57. The similarity between epigenetic profiles of CEBPA DM AMLs and CIMP suggests that the loss of function of CEBPA, either by genetic or epigenetic hits, drives the acquisition of a distinct epigenetic and transcriptional landscape.
Furthermore, analysis of motif activity using chromVAR revealed that CEBPA and other members of the C/EBP family were among the top 30 TFs with the largest variability in chromatin accessibility across the whole cohort (Supplementary Figure 8A). Importantly, C/EBP TFs were among the few with a significant loss of activity in CIMP relative to AML, whereas they displayed the largest increases of accessibility in AML compared to T-ALL (Figure 5B). Integration of gene expression data further confirmed that C/EBP family members are the only TFs showing simultaneous loss of expression and TFBS accessibility at significant levels (Figure 5E). Similarly, TF footprinting with TOBIAS identified various members of the C/EBP family as unbound in the CIMP group relative to AML (Supplementary Figure 8C).
Altogether, this underscores the importance of CEBPA as a critical determinant of cell identity and supports the notion that its loss in CIMP leukemias underlies their unique differentiation status. Of note, the +42-kb hematopoietic enhancer of CEBPA 58 is active in both CIMP and AML, but not in T-ALL (Supplementary Figure 13A-B). Therefore, it can be surmised that transformation takes place in a cell type that would normally be primed to express CEBPA, and thus exhibits some degree of multilineage priming.

Supplementary Figure 3.
Copy number and fusion gene analysis. Copy number alterations (CNAs) detected by different algorithms are represented as a heatmap where red indicates a copy number gain and blue a copy number loss. a Reanalysis of WES data by CNV Radar (n=14). b Validation of results from WES data in input DNA-sequencing with Control-FREEC (n=14). c Scatter plot showing CNVkit log2 copy number ratios (grey dots) and segmentation calls (orange lines) in the NF1 locus for two CIMP cases where both tumor and healthy samples were available. Copy number ratios are calculated as the depth of coverage in tumors relative to controls, following normalization. Segmentation calls group together genomic positions that are likely to have the same absolute copy number based on their copy number ratios. d Oncoprint displaying fusion genes detected by at least 3 different software tools and not commonly found in healthy individuals. Annotations indicate which fusion genes have been reported in different leukemia sequencing projects or fusion genes whose interacting partners are involved in leukemia.

Supplementary Figure 4.
Analysis of transcriptional signatures suggests that CIMPs have an early cell of origin. a GSEA enrichment plot for the gene set Zhang ETP-ALL 2 in comparisons between CIMP and T-ALL (left), AML (center) and CD34+ cells (right). Nominal p-values were calculated by a permutation test (n=10000) and adjusted for multiple comparisons with the Benjamini-Hochberg procedure. The false discovery rate (FDR) and the normalized enrichment score (NES) are indicated underneath. b Box plot displaying the cellular composition inferred by CIBERSORTx for each leukemia subgroup (or CD34+ cells) based on a signature derived from the Atlas of Human Blood Cells 3 (same dataset as in Figure 2f). The analysis was performed on a mixture matrix containing RNA-seq raw counts from multiple leukemia subgroups. The lower and upper edges of the boxplots represent the first and third quartiles, respectively; the horizontal line inside the box indicates the median. The whiskers extend to the most extreme values within the range between the median and 1.5 times the interquartile range. Lines between boxes show the p-value from a two-sided Wilcoxon test. c Heatmap displaying scores of single sample GSEA (ssGSEA) analysis with gene sets derived from various hematopoietic fractions by the Dick group 4. The analysis was conducted on fragments per million (FPM)-normalized RNA-seq data from multiple leukemia subgroups, followed by row-wise Z-score normalization. d Box plot showing the ssGSEA results from c. See b for a description of the box plot components and the statistical methodology. In both CIBERSORTx and ssGSEA, scores were calculated for every sample and aggregated by disease groups: CIMP (n=13), AML (n=189), CEBPA double mutant (DM) AML (n=22), T-ALL (n=100). The CEBPA DM subgroup was analyzed separately from other AMLs owing to its similarity with CIMP leukemias.
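To make the row-wise Z-score normalization mentioned for panel c concrete, the following minimal Python sketch standardizes each row (gene set) across samples. The function name, the use of the sample standard deviation, the handling of constant rows and the toy scores are assumptions for illustration; the caption does not specify how the normalization was implemented.

```python
from statistics import mean, stdev


def rowwise_zscore(matrix):
    """Standardize each row to mean 0 and sample standard deviation 1.

    Rows with zero variance are returned as all zeros, an assumption made
    here to avoid dividing by zero. Each row needs at least two values.
    """
    normalized = []
    for row in matrix:
        mu = mean(row)
        sd = stdev(row)  # sample standard deviation (n - 1 denominator)
        if sd == 0:
            normalized.append([0.0 for _ in row])
        else:
            normalized.append([(value - mu) / sd for value in row])
    return normalized


# Hypothetical ssGSEA scores: rows are gene sets, columns are samples.
toy_scores = [
    [1.0, 2.0, 3.0],
    [5.0, 5.0, 5.0],
]
zscores = rowwise_zscore(toy_scores)
```

Standardizing per row makes gene sets with different score ranges comparable within one heatmap, at the cost of hiding absolute differences between rows.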

Supplementary Figure 5.
CIMP leukemias exhibit hypermethylation at regulatory regions. Unless otherwise specified, methylation was measured by MCIP-seq with the following groups and sample sizes: CIMP (n=13), AML (n=50), T-ALL (n=14) and CD34+ cells (n=3). Panels b, j and k also include data from healthy granulocytes, monocytes, T-cells and B-cells (n=2 each). a Box plots displaying methylation levels of leukemias and CD34+ cells at genomic regions of the Blueprint regulatory build 5. The lower and upper edges represent the first and third quartiles, respectively; the horizontal line indicates the median. The whiskers extend to the most extreme values within the range between the median and 1.5 times the interquartile range. b Same as a, but at genomic features defined relative to gene features. c Average methylation levels in a 4-kb window around the center of putative promoters. d Tornado plot depicting methylation levels at putative promoters in CIMP leukemias and healthy cells. The HSPC tracks in purple were downloaded from ENCODE 6 and show chromatin accessibility (DNase-seq) as well as histone marks for enhancers (H3K4me1), promoters (H3K4me3), activation (H3K27ac) and repression (H3K27me3). GC density was downloaded from the UCSC browser 7. e Violin plot showing methylation levels of AML (n=272), T-ALL (n=119), ETP-ALL (n=24) and CIMP leukemias (n=5) at different genomic features measured by MethylationEPIC array. f Number of differentially methylated regions (DMRs) detected by supervised comparisons of MethylationEPIC array data. CpG sites with absolute mean beta value difference > 0.2 and FDR-adjusted p-value < 0.05 were considered differentially methylated. g-i Dot plots of GO term enrichment on genes overlapping DMRs obtained from MethylationEPIC array data. j Average methylation levels in 1-kb windows around the center of H3K4me1 peaks, either overlapping H3K27me3 peaks (bivalent regions, right) or not (left). k Box plots depicting the same data as in j, but from the central 100 bp of each peak. See a for a definition of the box plot components. l Summary of the top 5 most significant results of pre-ranked GSEA conducted on genes in the vicinity of DMRs between CIMP and T-ALL. The C2 (left) and C5 (right) MSigDB collections were used in the analysis. m Same as l, but showing enrichment in CIMP relative to CD34+ cells. n Genomic tracks of MCIP-seq data for selected leukemia samples at gene promoters with significant changes in methylation.
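The CpG-level criterion stated for panel f (absolute mean beta value difference > 0.2 and FDR-adjusted p-value < 0.05) can be sketched in Python as follows. The Benjamini-Hochberg adjustment is written out explicitly; the function names and the toy values are hypothetical, and the sketch does not reproduce the statistical test or software used to obtain the p-values in the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Walk from the largest p-value down, enforcing monotonicity.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def call_differential_cpgs(delta_beta, pvalues, min_delta=0.2, max_fdr=0.05):
    """Return indices of CpGs passing both the effect-size and FDR cut-offs."""
    fdr = benjamini_hochberg(pvalues)
    return [
        i
        for i, (delta, q) in enumerate(zip(delta_beta, fdr))
        if abs(delta) > min_delta and q < max_fdr
    ]


# Invented mean beta differences (group 1 minus group 2) and raw p-values.
toy_delta_beta = [0.35, -0.25, 0.10, 0.40]
toy_pvalues = [0.001, 0.01, 0.03, 0.5]
toy_hits = call_differential_cpgs(toy_delta_beta, toy_pvalues)
```

In the toy input, the third CpG fails the effect-size cut-off and the fourth fails the FDR cut-off, so only the first two are retained.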

Supplementary Figure 6.
Integration of methylation and gene expression data reveals widespread silencing of transcription factors in CIMP. a Heatmap displaying normalized methylation levels (MCIP-seq) at promoters of genes encoding hematopoietic TFs exhibiting both hypermethylation and loss of expression in CIMP with respect to AML (CIMP vs AML), T-ALL (CIMP vs T-ALL) or both (CIMP vs AML & T-ALL). The heatmap in the middle shows normalized expression levels (RNA-seq) of the same genes in leukemia cells and healthy HSPCs. The rightmost heatmap presents normalized gene expression (CAGE-seq) in different healthy cells. b Jitter plots showing methylation (top) and expression (bottom) of a few selected genes in CIMP, other leukemias and CD34+ HSPCs. The horizontal black lines represent pairwise comparisons between CIMP and other leukemias. Statistical significance was determined by a two-sided Wald test in the DESeq2 package and corrected for multiple testing with the Benjamini-Hochberg procedure. Sample sizes for MCIP-seq and RNA-seq were, respectively: 13/13 (CIMP), 50/211 (AML), 14/100 (T-ALL). c Starburst plot depicting changes in gene expression (Y-axis) and H3K27ac

Supplementary Figure 8.
Estimation of motif activity based on open chromatin data. a Transcription factor motifs ranked by variability in chromatin accessibility, as measured by chromVAR using ATAC-seq data. The top 40 are shown as a zoom-in underneath. b Heatmaps displaying the individual-specific motif activities, as Z-score, of the top 50 TFs with highest differential activity between CIMP and AML (left), CIMP and T-ALL (middle), AML and T-ALL (right). The activity of motifs corresponding to the same TF was averaged. c Volcano plot showing the results of the TF footprinting analysis conducted with TOBIAS, aggregating data from all CIMP patients (n=9), AMLs (n=20) and T-ALL (n=20). The top 30 most significant results are highlighted and labelled. d Aggregated ATAC-seq signal across all motif occurrences of 4 TFs exhibiting significantly different footprints between CIMP and either AML (CEBPB, GATA3) or T-ALL (LEF1, PU.1

Supplementary Figure 9.
CIMP leukemias exhibit dysregulated patterns of transcription factor binding. a Tornado plot depicting the signal of three TFs (CEBPA, SPI1 and TCF7 from left to right) in the 5000 most variable peaks in CIMP, AML and T-ALL (n=2 per group). The peaks are sorted based on total signal across all samples, from higher to lower. b Bar plots showing the number of differential peaks between CIMP and either AML or T-ALL for each TF (CEBPA, SPI1 and TCF7, left to right). Only peaks with p-value < 0.01 and log2(fold change) > 1 are shown. c Genomic tracks of selected loci with differential TF binding. Left: binding of CEBPA is lost at a +14 kb enhancer of SPI1; middle: CIMP cases exhibit loss of SPI1 binding at a KLF4 enhancer, compared to AML; right: TCF7 binds IL10RA in T-ALL, but not in CIMP leukemias.

Supplementary Figure 10.
Additional analyses of epigenomics data. a Box plots displaying open chromatin levels (ATAC-seq) computed by DiffBind at enhancers and promoters for CIMP (n=9), AML (n=51) and T-ALL (n=19). The lower and upper edges of the boxplots represent the first and third quartiles, respectively; the horizontal line inside the box indicates the median. The whiskers extend to the most extreme values within the range between the median and 1.5 times the interquartile range. b Bar plot of differentially accessible regions in various supervised comparisons. A threshold of FDR < 0.05 and |log2 FC| > 0.5 was used to determine significance. c Bar plot showing the top results (10 highest and 10 lowest) from gene set enrichment analysis (GSEA) conducted on genes close to open chromatin peaks, using the C2 collection. The ranking of genes was based on differential chromatin accessibility for each comparison, i.e. CIMP vs AML (left), CIMP vs T-ALL (middle) and CIMP vs CD34+ cells (right). d-f Same as a-c, but using H3K27ac ChIP-seq data instead in CIMP (n=9), AML (n=51) and T-ALL (n=19). g-i Same as a-c, but using H3K27me3 data instead in CIMP (n=5) and AML (n=22).

Supplementary Figure 11.
Effects of methylation on CTCF binding in CIMP and other leukemias. a Venn diagram with overlap between the consensus lists of MCIP-seq peaks and CTCF ChIP-seq peaks. b Box plot displaying changes in methylation between CIMP and AML (left) or T-ALL (right) at MCIP-seq peaks that either overlap with CTCF binding sites or do not. The lower and upper edges of the boxplots represent the first and third quartiles, respectively; the horizontal line inside the box indicates the median. The whiskers extend to the most extreme values within the range between the median and 1.5 times the interquartile range. c Number of CTCF binding sites that are gained (Gain), lost with hypermethylation (Lost_met), lost without change in methylation (Loss_nomet) or unchanged in comparisons between CIMP and AML. Only regions with data from both CTCF ChIP-seq and MCIP-seq are considered. d Average frequency of CpG dinucleotides in all detected CTCF binding sites at every position of the CTCF motif (MA0139.1, JASPAR database 7). e Same as d, but average CpG frequencies are calculated for the fractions of differential CTCF peaks described in c. Sample sizes for CTCF ChIP-seq in CIMP, AML and T-ALL were 9, 10 and 19 respectively.
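The per-position CpG frequency described for panels d and e can be illustrated with a short Python sketch. It assumes that every motif occurrence has already been extracted as a sequence of equal length on the motif strand, and that a CpG is assigned to the position of its cytosine; both conventions, the function name and the toy sequences are assumptions for illustration and are not specified in the caption.

```python
def cpg_frequency_by_position(sequences):
    """Fraction of motif occurrences with a CpG starting at each position.

    `sequences` holds motif occurrences of identical length. The result has
    one entry per position that can start a dinucleotide (length - 1).
    """
    if not sequences:
        raise ValueError("at least one sequence is required")
    width = len(sequences[0])
    if any(len(seq) != width for seq in sequences):
        raise ValueError("all sequences must have the same length")

    counts = [0] * (width - 1)
    for seq in sequences:
        upper = seq.upper()
        for pos in range(width - 1):
            if upper[pos:pos + 2] == "CG":
                counts[pos] += 1
    return [count / len(sequences) for count in counts]


# Invented 4-bp "motif occurrences"; real CTCF sites would span the full motif.
toy_sites = ["ACGT", "CCGA", "acgt", "TTTT"]
toy_frequencies = cpg_frequency_by_position(toy_sites)
```

Computing the same profile separately for each class of differential CTCF peaks from panel c would give the stratified comparison described for panel e.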

Supplementary Figure 12.
Differences in 3D genome organization between leukemia groups. Differential analyses of Hi-C data in panels d-g were performed between CIMP (n=8) and either AML (n=5) or T-ALL (n=4). a Bar plots depicting the percentage of differential CTCF peaks that overlap with TAD boundaries or loop anchors. b UMAP plot of TAD inclusion ratios (IR) calculated by HOMER in Hi-C data. c UMAP plot of loop density scores calculated by HOMER in Hi-C data. d Distribution of gains or losses in CTCF binding in variable TADs (top) or differential interactions (bottom) when comparing CIMP vs T-ALL. e Box plot showing the change in CTCF binding (expressed as log2) at gained or lost differential interactions between CIMP and AML (left) or T-ALL (right). The lower and upper edges of the boxplots represent the first and third quartiles, respectively; the horizontal line indicates the median. The whiskers extend to the most extreme values within the range between the median and 1.5 times the interquartile range. For the CTCF ChIP-seq analysis, we used 9 CIMP, 10 AML and 19 T-ALL. f Merged Hi-C contact map of the IRF8 locus, comparing interactions between the CIMP (uppermost triangle, n=5) and AML groups (bottom triangle, n=5). ΔTADs are highlighted as squares, colored in green if insulation is gained or in red if insulation is lost; DIs are indicated with black circles. Underneath, all loops detected in this region are shown in black, if they are invariable across conditions, and in green or red if they are gained or lost in CIMP relative to AML, respectively. The tracks below display MCIP-seq, CTCF ChIP-seq and H3K27ac ChIP-seq from CIMP and AML (n=4). Peaks gained in CIMP are highlighted in turquoise, whereas lost peaks are highlighted in light brown. The last track shows p300 binding measured by ChIP-seq in the K562 cell line.

Supplementary Figure 13.
Additional figures for discussion. a-c Merged Hi-C contact maps of the CEBPA locus, comparing interactions between CIMP and AML (a), CIMP and T-ALL (b), AML and T-ALL (c). The CEBPA TAD is indicated as a black square. Underneath, all loops detected in this region are shown in black, if they are invariable across conditions, and in green or red if they are gained or lost, respectively. The tracks below display aggregated MCIP-seq, CTCF ChIP-seq and H3K27ac ChIP-seq data (n=4 each). Peaks gained are highlighted in turquoise, whereas lost peaks are highlighted in light brown. The last track shows p300 binding measured by ChIP-seq in the K562 cell line. d Proposed mechanism of preferential hypermethylation at H3K27ac-marked regions. In CIMP leukemias, lack of DNMT3L and loss of H3K4me3 at bivalent regions enables the binding of DNMT3 proteins, recruited by EZH2. Moreover, the lack of TET2 prevents active demethylation. e Diagram summarizing the epigenetic mechanisms described in this study leading to differentiation block in CIMP leukemias. Aberrant DNA hypermethylation represses transcription of key hematopoietic TFs (notably CEBPA) and disrupts the binding of TFs genome-wide. In particular, the loss of CTCF binding at TAD boundaries and loop anchors results in local 3D genome reorganization, which is also influenced by silencing of TFs like LEF1 or KLF4, known to be involved in chromatin remodelling. Altogether, these processes rewire regulatory networks driving hematopoietic differentiation in early progenitors, leaving them unable to fully commit to either the lymphoid or the myeloid lineage.

Supplementary Figure 1.
MCIP-seq data analysis. a Coverage of CpG islands, shores, shelves and inter-CGI by MCIP-seq data, compared to an equal set of randomly selected regions. The plot shows enrichment at CpG-rich regions and depletion at inter-CGI regions relative to the random set. b Functional annotation of methylated regions detected by MCIP-seq. c Pearson correlation heatmap of MCIP-seq data from AML (n=50) and CIMP (n=13) cases. d MCIP-seq data dimensionality reduction separates CIMP from AML and CD34+ HSPCs using Principal Component Analysis (PCA, left) and [...]

Supplementary Figure 2.
Epigenetic and transcriptional landscape of CIMP, AML, T-ALL and CD34+ cells. a Dimensionality reduction with either PCA (left) or UMAP (right) of MCIP-seq data, comparing AML (n=50), CIMP (n=13) and T-ALL (n=14). Relevant AML subgroups known to exhibit distinct patterns of gene expression are colored: inv(16), t(8;21), t(9;22), t(15;17), 11q23 (MLL) rearrangements [t(11q23)], CEBPA double mutations (CEBPA_DM). ETP-ALL cases identified on the basis of their immunophenotype are also labelled. b UMAP dimensionality reduction of MethylationEPIC array data from CIMP leukemias (n=5) in combination with previously published AML (n=272) and T-ALL (n=143) data. The T-ALL samples are labelled as follows: DNA methylation-driven clusters identified by Touzart et al. 1, where C1 has the lowest methylation and C5 the highest (left); CIMP-positive or CIMP-negative, also based on global methylation levels (middle); ETP or non-ETP immunophenotype (right). c Dimensionality reduction with either PCA or UMAP of various types of epigenomics data in AML, CIMP, T-ALL and CD34+ HSPCs. Relevant AML subgroups known to exhibit distinct patterns of gene expression are colored (see above), as well as ETP-ALL cases. d-g Pearson correlation heatmaps of the data shown in c, with dendrograms indicating hierarchical clustering. Column names contain the patient identifier, except in the gene expression heatmap (e), where the high number of cases would compromise legibility.